To deal with CJKV variant characters, additional variation selectors were introduced into the Supplementary Special-purpose Plane of Unicode. The new selectors run from variation selector-17 (U+E0100) to variation selector-256 (U+E01EF).
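The selector numbering is plain offset arithmetic. The sketch below converts between a selector number n and its code point, covering VS1 to VS16 (U+FE00 to U+FE0F) as well as the supplementary range; the function names are illustrative and do not come from any library.

```python
import unicodedata
from typing import Optional

def vs_codepoint(n: int) -> int:
    """Code point of variation selector-n, for 1 <= n <= 256."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)        # VS1..VS16: U+FE00..U+FE0F
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)      # VS17..VS256: U+E0100..U+E01EF
    raise ValueError(f"there is no variation selector-{n}")

def vs_number(cp: int) -> Optional[int]:
    """Inverse of vs_codepoint: n if cp is variation selector-n, else None."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    return None

print(hex(vs_codepoint(17)), hex(vs_codepoint(256)))  # 0xe0100 0xe01ef
print(unicodedata.name(chr(0xE0100)))                 # VARIATION SELECTOR-17
```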

Although they were introduced as early as Unicode 4.0, published in 2003, these variation selectors are still not widely supported by modern computer systems.

A minimal font was created to test how the following font systems behave: GTK3, Qt6, Windows 7 to 10, and macOS Big Sur.

The minimal font

glyphs

The glyphs in the font:

  • A (U+0041) character A

  • B (U+0042) character B

  • Z (U+005A) character Z

  • a (U+0061) character a

  • b (U+0062) character b

  • c (U+0063) character c

A special one:

  • = (U+0064) character d

Note that the glyph associated with character 'd' is '=' here!

Another special one:

  • @ (U+0064 U+E0100) character d variation 17

Note that the glyph associated with character 'd variation 17' is '@' here!

ligature rules

The ligature rules defined in the font:

  1. 'a + U+E0100' will be ligatured to 'A'

  2. 'b + U+E0100' will be ligatured to 'B'

  3. 'c + c' will be ligatured to 'Z'

As the glyph of character 'd variation 17' is '@', and the glyph of character 'd' is '=', there are two implicit rules:

  1. 'd' will be transformed to '='

  2. 'd + U+E0100' will be ligatured to '@'
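To make the expected behaviour concrete, here is a toy model of the test font, assuming glyphs can be represented as plain strings. The three dictionaries are hypothetical and simply mirror the lists above; real OpenType shaping works on glyph IDs, resolves variation sequences through the `cmap` table and applies ligatures from the GSUB table, which this sketch only imitates. Applied to the interesting lines of the test file in the next section, it shows what a renderer honouring every rule would be expected to display.

```python
# Toy model of the test font. Glyphs are plain strings here; the tables are
# hypothetical dictionaries that mirror the lists in the text.
VS17 = "\U000E0100"  # variation selector-17

# Default glyph for each plain character ('d' deliberately shows '=').
CMAP = {"A": "A", "B": "B", "Z": "Z", "a": "a", "b": "b", "c": "c", "d": "="}

# Glyphs assigned directly to (base, selector) variation sequences.
VARIATION_SEQUENCES = {("d", VS17): "@"}

# Ligature rules: a pair of input characters replaced by a single glyph.
LIGATURES = {("a", VS17): "A", ("b", VS17): "B", ("c", "c"): "Z"}

def shape(text: str) -> str:
    """Map text to glyphs left to right; characters without a glyph become '?'."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in VARIATION_SEQUENCES:
            glyphs.append(VARIATION_SEQUENCES[pair])
            i += 2
        elif pair in LIGATURES:
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            glyphs.append(CMAP.get(text[i], "?"))
            i += 1
    return "".join(glyphs)

for line in ["aa" + VS17, "bb" + VS17, "cc", "d" + VS17, "d"]:
    print(shape(line))  # prints aA, bB, Z, @, = in turn
```

Python strings are sequences of code points, so U+E0100 counts as a single character here; this keeps the model simple but hides any encoding issues a real toolkit may have.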

How the font works in real systems

A test text file was created; its content is listed below. On each line, the text after the ';' symbol is a comment:

U+0061 U+0061 U+E0100  ; a, a variation17
U+0062 U+0062 U+E0100  ; b, b variation17
U+265F U+FE0E          ; emoji chess, text version
U+265F U+FE0F          ; emoji chess, graph version
U+0031 U+20E3          ; 1, enclosing keycap
U+0032 U+20E3          ; 2, enclosing keycap
U+0063 U+0063          ; c, c
U+0064 U+E0100         ; d, variation17
U+0064                 ; d
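The test file notation is easy to turn back into real text. A minimal sketch, assuming each line is a list of `U+XXXX` tokens optionally followed by a `;` comment; it decodes every line and prints it with `ascii()` so the invisible selectors stay visible. Writing the decoded lines out with `open(path, "w", encoding="utf-8")` would reproduce the test file itself.

```python
TEST_FILE = """\
U+0061 U+0061 U+E0100  ; a, a variation17
U+0062 U+0062 U+E0100  ; b, b variation17
U+265F U+FE0E          ; emoji chess, text version
U+265F U+FE0F          ; emoji chess, graph version
U+0031 U+20E3          ; 1, enclosing keycap
U+0032 U+20E3          ; 2, enclosing keycap
U+0063 U+0063          ; c, c
U+0064 U+E0100         ; d, variation17
U+0064                 ; d
"""

def parse_line(line: str) -> str:
    """Decode 'U+0064 U+E0100 ; comment' into the string it describes."""
    codes = line.split(";", 1)[0].split()   # drop the comment, keep the tokens
    for tok in codes:
        if not tok.upper().startswith("U+"):
            raise ValueError(f"unexpected token {tok!r}")
    return "".join(chr(int(tok[2:], 16)) for tok in codes)

decoded = [parse_line(line) for line in TEST_FILE.splitlines() if line.strip()]
for number, text in enumerate(decoded, start=1):
    print(number, ascii(text))  # ascii() escapes the selectors, e.g. \U000e0100
```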

Qt6

As shown in the image, ligatures work as long as the combining character's code point is below U+10000. Beyond that, neither the ligatures (lines 1 and 2) nor the directly assigned variation selector (line 8, i.e. 'd, variation17') work. Qt6 treats U+E0100 as if it does not exist.

Qt6's font processing is inadequate.
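One concrete difference between U+E0100 and the characters that do work is that it lies outside the Basic Multilingual Plane, so in UTF-16 (the encoding Qt's `QString` uses internally) it takes a surrogate pair of two code units instead of one. The sketch below only illustrates that encoding; whether it has anything to do with Qt6's behaviour is not established by this test.

```python
from typing import List

def utf16_units(cp: int) -> List[int]:
    """Encode one code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

for cp in (0x0063, 0x20E3, 0xFE0F, 0xE0100):
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X} -> {units}")
# U+0063 -> 0063
# U+20E3 -> 20E3
# U+FE0F -> FE0F
# U+E0100 -> DB40 DD00
```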

GTK3

As shown in the image, both the ligatures (lines 1 and 2) and the directly assigned variation selector (line 8, i.e. 'd, variation17') work.

Perfect!

Windows 7 - 10

As shown in the image, ligatures work as long as the combining character's code point is below U+10000. Beyond that, only the directly assigned variation selector (line 8, i.e. 'd, variation17') works; the ligatures on lines 1 and 2 do not.

Not perfect, but we can always use directly assigned variation selectors to work around the problem.

macOS Big Sur

As shown in the image, ligatures work as long as the combining character's code point is below U+10000. Beyond that, the system strips the variation selector and falls back to another font for the stripped character. Line 9, the plain character 'd', is shaped as '=', which is expected; but line 8, 'd variation17', is shaped as neither '=' nor '@': it falls back to a 'd' that is obviously borrowed from another font. What a mess!

A disaster. Pooh.

Cross-platform ancient CJKV processing system

To build a cross-platform system for processing ancient CJKV characters, the first step is to drop macOS from the list of supported platforms. Ligatures should also be avoided when recording variation characters. Glyphs with directly assigned variation selectors work out of the box in Windows and GTK-based GUI systems, and because their data format is so straightforward (no GSUB tables, glyph classes or rules), we can simply write an auxiliary routine that extends the Qt library to handle them.
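A first step for such an auxiliary routine would be to find the variation sequences in the text, so that each (base, selector) pair can be resolved as a unit instead of being handed to the toolkit's shaper. A minimal sketch of that splitting step; the function names are hypothetical, and it recognises only VS1 to VS256, ignoring other combining marks.

```python
from typing import List, Optional, Tuple

def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00-U+FE0F) and VS17-VS256 (U+E0100-U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def split_variation_sequences(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text into (base, selector) pairs; selector is None if absent.

    A selector with no base to attach to (at the start of the text, or right
    after another selector) is kept as ("", selector) so nothing is lost.
    """
    items: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        if not is_variation_selector(ch):
            items.append((ch, None))
        elif items and items[-1][0] and items[-1][1] is None:
            items[-1] = (items[-1][0], ch)   # attach to the preceding base
        else:
            items.append(("", ch))           # orphan selector
    return items

VS17 = "\U000E0100"
for base, selector in split_variation_sequences("d" + VS17 + "d"):
    print(repr(base), None if selector is None else hex(ord(selector)))
# 'd' 0xe0100
# 'd' None
```

In a UTF-16 toolkit such as Qt, the same loop would have to iterate over code points rather than 16-bit units, since U+E0100 and above arrive as surrogate pairs.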

I once wrote a Qt-based input method for entering ancient Chinese variant characters. In that system, the input method manually looks up the glyph for a code point and selector in the loaded font, and then manually draws that glyph onto the paint device of the Qt widget. That's how the auxiliary routine works.
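Such a manual lookup can be done directly on the font data. In OpenType, glyphs assigned directly to variation sequences are stored in a format 14 subtable of the `cmap` table, so the sketch below parses one using only the standard library. It is not the code of the input method described here: it assumes the subtable's bytes have already been located inside the font's `cmap` table (not shown), the function names are hypothetical, and `build_format14` with glyph ID 7 is made up purely to exercise the lookup. On the Qt side, a glyph ID obtained this way could then be drawn with `QPainter::drawGlyphRun()` using a `QGlyphRun` built on a `QRawFont`, which is also not shown.

```python
import struct
from typing import Optional, Union

USE_DEFAULT = "default"  # sentinel: render the base with its normal cmap glyph

def _u24(data: bytes, off: int) -> int:
    """Read a big-endian 24-bit unsigned integer."""
    return int.from_bytes(data[off:off + 3], "big")

def lookup_uvs(sub: bytes, base: int, selector: int) -> Union[int, str, None]:
    """Look up (base, selector) in a cmap format 14 subtable.

    Returns a glyph ID for a non-default mapping, USE_DEFAULT if the sequence
    is listed as using the default glyph, or None if it is not listed.
    """
    fmt, _length, n_records = struct.unpack_from(">HII", sub, 0)
    if fmt != 14:
        raise ValueError("not a format 14 subtable")
    for r in range(n_records):
        rec = 10 + 11 * r                      # 10-byte header, 11-byte records
        if _u24(sub, rec) != selector:
            continue
        default_off, non_default_off = struct.unpack_from(">II", sub, rec + 3)
        if non_default_off:                    # UVSMapping: uint24 + uint16
            (n_maps,) = struct.unpack_from(">I", sub, non_default_off)
            for m in range(n_maps):
                entry = non_default_off + 4 + 5 * m
                if _u24(sub, entry) == base:
                    return struct.unpack_from(">H", sub, entry + 3)[0]
        if default_off:                        # UnicodeRange: uint24 + uint8
            (n_ranges,) = struct.unpack_from(">I", sub, default_off)
            for k in range(n_ranges):
                entry = default_off + 4 + 4 * k
                start, extra = _u24(sub, entry), sub[entry + 3]
                if start <= base <= start + extra:
                    return USE_DEFAULT
        return None
    return None

def build_format14(base: int, selector: int, glyph_id: int) -> bytes:
    """Build a tiny subtable with one non-default mapping (testing only)."""
    header_len = 10 + 11                       # header + one selector record
    non_default = (struct.pack(">I", 1) + base.to_bytes(3, "big")
                   + struct.pack(">H", glyph_id))
    return (struct.pack(">HII", 14, header_len + len(non_default), 1)
            + selector.to_bytes(3, "big") + struct.pack(">II", 0, header_len)
            + non_default)

sub = build_format14(0x0064, 0xE0100, 7)   # 'd' + VS17 -> hypothetical glyph 7
print(lookup_uvs(sub, 0x0064, 0xE0100))    # 7
print(lookup_uvs(sub, 0x0061, 0xE0100))    # None
```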

- ZAN DoYe


© 2025 ZAN DoYe